ABSTRACT


The past ten years of research on computer vision in the JPL
Robotics Laboratory have matured into a powerful real-time system
composed of standardized commercial hardware, computers, and
pipeline processing laboratory prototypes, supported by an
extensive set of image processing algorithms. The software
system has been made transportable by the choice of a popular
high-level language (PASCAL) and a widely used computer
(VAX-11/750). It comprises a broad range of low-level and
high-level processing software that has proven to be versatile
in applications ranging from factory automation to space
satellite tracking and grappling.
1. INTRODUCTION


This document describes our computer vision software
library. Most of the programs set forth here form the software
base upon which we build programs to experiment with new
algorithms and to demonstrate computer vision applications.
Also described are some object tracking and acquisition
programs representing current research (Section 7) and our
implementation of camera calibration (Section 5). The software
described here has evolved over varying periods of time,
reflecting changes in hardware and algorithmic refinements based
on our experience in using the programs. This document should
be useful to the computer vision research and development
community at large.

Most of the vision software was originally developed on a
General Automation SPC-16/85 minicomputer in FORTRAN and assembly
language. Most of these programs were transferred in 1984 to a
DEC VAX-11/750 computer running the VMS operating system; the few
remaining programs will be transferred in the near future.
FORTRAN programs transferred to the VAX have been converted to
PASCAL, and all new high-level software is written in PASCAL.
Some low-level programs, such as interrupt handlers and device
I/O routines, are written in assembly language (MACRO) as
necessary.

The choice of a language for program development on the VAX
was based on the desire to use a language with modern control
statements (IF-THEN-ELSE, DO-WHILE loops, REPEAT-UNTIL loops,
CASE statements), structured data types, and strong variable
typing. At the same time, we wanted our software to be
transportable. Given the current lack of standardization of
programming languages and operating systems, full
transportability (i.e., running programs on another computer with
no modification) is practical only between two like computers
running the same operating system. Beyond that, the best that
can be hoped for is that programs can be made to run on a
different computer with minimal changes. In that case, it is
much easier to modify programs written in a structured language
such as PASCAL (as opposed to FORTRAN, for example) without
introducing undesirable side effects that are difficult to
correct. The logical choice was thus PASCAL, which has the
required language features and is supported by DEC under
VAX/VMS.

While, strictly speaking, our software is fully transportable
only to another VAX (11/750 or any other model), the fact that
PASCAL is supported by most other modern operating systems means
that transfer to a different computer should require minimal
conversion effort.

As a final note on languages, it appears highly likely, if 
not certain, that ADA will be in wide use by the end of this 
decade, replacing PASCAL and other high-level languages, 
particularly on large computers such as the VAX. Since ADA is 
based largely on PASCAL, conversion to ADA would be relatively 
painless should we elect to do that in the future. It is 
interesting to note that many of the PASCAL language extensions 
provided by DEC resemble standard features of ADA. This is 
particularly true of the "environment/inherit" interface which 
links separately compiled PASCAL modules in a manner which is 
roughly equivalent to the "package" facility in ADA. 

The following sections describe the vision system hardware
and present the vision system software in an order that
generally proceeds from low level to high level. Section 2
details the essential hardware components of the vision system.
Section 3 describes low-level routines associated with image I/O
and hardware control. Section 4 sets forth the image feature
extraction algorithms. Section 5 describes the camera
calibration model and routines associated with calibration.
Section 6 details coordinate transformation routines which use
the camera calibration model to relate 2-D image points to 3-D
points in a scene. Section 7 describes several object tracking
algorithms and an automatic object acquisition program for
tracker initialization.




2. VISION SYSTEM HARDWARE

The Robotics Laboratory hardware architecture is illustrated
in Figure 2-1. The following sections describe its various
components.

2.1 VAX 

The VAX-11/750 has 2 Mbytes of memory, 180 Mbytes of disk
storage (a 160-Mbyte fixed disk and two 10-Mbyte removable
cartridge disks), 16 serial ports, two DR11-H 16-bit parallel
ports, and a floating point accelerator. As resources become
available, we plan to add a tape drive, a 450-Mbyte disk drive,
and additional memory.

The vision hardware is installed on a QBUS connected to the
VAX UNIBUS through a bus repeater. The bus repeater is
transparent at the software level, so the QBUS peripherals are
programmed as if they were actually plugged into the UNIBUS.

2.2 Cameras 

We are using three Hitachi KF-120 CCD cameras. The CCD
array in these cameras is 244 lines by 320 pixels per line. The
cameras are run in non-interlaced mode, which provides a full
image every 1/60 second. Two of the cameras are mounted as a
stereo pair approximately six feet apart and oriented such that
the common field of view is at a range of six feet. The third
camera is mounted midway between the other two, approximately
three feet higher. This arrangement will be used for
three-camera stereo experiments.

2.3 Digitizer 

The digitizer is a DATACUBE QAF-120, which digitizes at 6.1
MHz, the pixel rate of the Hitachi cameras. The digitizer can
select one of four video inputs to be digitized.

2.4 IMFEX 

The digitized video is input to IMFEX, a pipeline processor
designed and built at JPL [1]. One of seven IMFEX outputs is
selected to be stored in the frame buffers. These outputs
include the raw digitized video, one of four stages of an edge
detector, and two forms of binary image (thresholded video).
The algorithms built into IMFEX are described in Section 4.3.


2.5 Frame buffers 

The frame buffers are DATACUBE QVG-120s, designed as
companion products to the QAF-120 digitizer. In acquire mode,
the frame buffers continuously store full images at the video
rate, i.e., 1/60 second per image. In computer access mode, an
image is frozen in the buffer. Programs have random access to
single pixels for read/write operations by specifying the row and
column address of the desired pixel. For line-oriented
operations, auto-increment may be selected after specifying only
the address of the first pixel.

2.6 KW11-P Clock

The KW11-P is used to generate interrupts at 60 Hz so that
programs can be synchronized to the video frame rate. The
horizontal drive signal from one frame buffer is connected to the
external clock input on the KW11-P, which is programmed to
interrupt every 262 horizontal drives (one frame time).
Interrupt routines in the VAX perform functions such as counting
frames, switching digitizer inputs, and switching frame buffers
between acquire mode and computer access mode.

3. IMAGE UTILITIES 

The routines described in this section are low-level
utilities which provide computer access to image data and some
simple graphics capabilities for displaying the results of image
processing.

The following standard arguments are used by the routines
described in the following sections.

I - pixel column address, 0 ≤ I ≤ 319.

J - pixel row address, 0 ≤ J ≤ 239.

G - pixel gray level, 0 ≤ G ≤ 255.

The notation G(I,J) stands for the gray level of pixel
(I,J).

FB - frame buffer number (0, 1, ...).

CAM - camera number (1, 2, ...).

LBUF - line buffer (320-byte array) which stores one line of
image data.

Additional arguments are described as they occur in the text.

3.1 Initialization 

Vision system peripherals on the QBUS are programmed by
direct access to device registers to minimize software overhead.
This requires mapping the corresponding UNIBUS I/O page into the
program's virtual address space, which is the minimum
initialization required to use the vision system. Additional
initialization consists of programming the peripherals with a
default set of parameters. The initialization routines are:




FUNCTION MAPIO: INTEGER


performs the UNIBUS I/O page mapping.

Success or failure is indicated by an odd or even
return value, respectively.

PROCEDURE INITVISION 

calls MAPIO and then programs the

peripherals (digitizer, frame buffers, IMFEX) to a

standard initialization state.

3.2 Frame Buffer I/O

These routines transfer image data between the VAX and the 
Datacube frame buffers. 

FUNCTION GETPNT(FB, I, J): INTEGER

returns the gray level of pixel (I, J) in frame buffer 
FB. 

PROCEDURE GETLINE(FB, J, LBUF)

loads image data from line J of frame buffer FB into 
the array LBUF. 

FUNCTION GETNEXT(FB): INTEGER

returns the gray level of the "next" location in frame
buffer FB; i.e., after GETPNT(FB, I, J), successive
calls to GETNEXT return G(I+1, J), G(I+2, J), and so on.
Upon reaching the end of a line, the next pixel location
is (0, J+1).

PROCEDURE PUTPNT(FB, I, J, G)

stores gray level G in location (I, J) of frame buffer
FB.


PROCEDURE PUTLINE(FB, J, LBUF) 

copies the array LBUF to line J in frame buffer FB. 

PROCEDURE PUTNEXT(FB, G)

used to write successive frame buffer locations (see
remarks under GETNEXT).
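To make the auto-increment addressing of GETNEXT/PUTNEXT concrete, here is a minimal Python sketch that models a frame buffer in memory. The class name `FrameBuffer`, its lowercase method names, and the 320 x 240 size are assumptions for illustration only; the real routines program the Datacube registers directly.

```python
WIDTH, HEIGHT = 320, 240  # assumed frame buffer size: 0 <= I <= 319, 0 <= J <= 239


class FrameBuffer:
    """Hypothetical in-memory model of a frame buffer with auto-increment."""

    def __init__(self):
        self.pix = [[0] * WIDTH for _ in range(HEIGHT)]  # pix[J][I]
        self.i = 0  # current column address
        self.j = 0  # current row address

    def _advance(self):
        # Auto-increment: move right; at the end of a line wrap to (0, J+1).
        self.i += 1
        if self.i == WIDTH:
            self.i, self.j = 0, self.j + 1

    def getpnt(self, i, j):
        self.i, self.j = i, j
        return self.getnext()

    def getnext(self):
        g = self.pix[self.j][self.i]
        self._advance()
        return g

    def putpnt(self, i, j, g):
        self.i, self.j = i, j
        self.putnext(g)

    def putnext(self, g):
        self.pix[self.j][self.i] = g
        self._advance()

    def getline(self, j):
        return list(self.pix[j])

    def putline(self, j, lbuf):
        self.pix[j] = list(lbuf)
```

For example, after `putpnt(319, 5, 7)` the next `putnext` writes location (0, 6). The sketch does no bounds checking past the last line.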

3.3 Disk I/O 

Images may be stored in disk files and retrieved later with 
the routines described below. Disk I/O is strictly sequential on 
a line by line basis. Random access to input files can be 
achieved by first copying the image to a frame buffer. Note that 
these routines will operate on arbitrarily sized images.

The following arguments are used by these routines: 

FCB - pointer to a file control block. 

Several files may be opened simultaneously, each with 
its own FCB. 

NAME - string argument specifying file name. 

TYPE - string argument specifying 'new' or 'old' (existing)
file.

NL - number of lines in the image. 

NC - number of columns in the image. 

FUNCTION IMAGEOPEN(FCB, NAME, TYPE, NL, NC): BOOLEAN

opens an image file for input ('old') or output
('new'). The returned value indicates success (true) or
failure (false). For new files, NL and NC are defined
by the calling program. For old files, IMAGEOPEN
assigns the values stored in the image file header to
NL and NC.


FUNCTION LINEIN(FCB, LBUF): BOOLEAN

reads the next line of an 'old' image into LBUF. The
calling program must keep track of line numbers.
Returns true if the operation is successful. Returns
false if an I/O error occurs, such as an attempt to read
beyond the end of file, or if the file type is 'new'.

FUNCTION LINEOUT(FCB,LBUF): BOOLEAN 

writes the next line of a 'new' image into the disk
file. The calling program must keep track of line
numbers. Returns true if the operation is successful.
Returns false if an I/O error occurs or if the file
type is 'old'.

PROCEDURE IMAGECLOSE(FCB)

closes an image file. If the file type is 'new', this 
procedure writes the file header recording the actual 
number of lines written in the file. 
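The sequential line-by-line access and the header rewrite performed by IMAGECLOSE can be sketched as follows. The on-disk layout used here (two little-endian 32-bit integers NL and NC followed by NL*NC one-byte pixels) and the class and method names are assumptions; the actual file format of these routines is not described in this document.

```python
import struct

HEADER = struct.Struct("<ii")  # hypothetical header: NL (lines), NC (columns)


class ImageFile:
    """Hypothetical sketch of IMAGEOPEN / LINEIN / LINEOUT / IMAGECLOSE."""

    def __init__(self, name, type_, nl=0, nc=0):
        self.type = type_
        if type_ == "new":
            self.f = open(name, "w+b")
            self.nl, self.nc, self.written = nl, nc, 0
            self.f.write(HEADER.pack(nl, nc))  # placeholder, rewritten on close
        elif type_ == "old":
            self.f = open(name, "rb")
            self.nl, self.nc = HEADER.unpack(self.f.read(HEADER.size))
        else:
            raise ValueError("type must be 'new' or 'old'")

    def linein(self, lbuf):
        """Read the next line into lbuf; False at end of file or on a 'new' file."""
        if self.type != "old":
            return False
        data = self.f.read(self.nc)
        if len(data) != self.nc:
            return False
        lbuf[:] = list(data)
        return True

    def lineout(self, lbuf):
        """Append one line of NC gray levels (0..255); False on an 'old' file."""
        if self.type != "new" or len(lbuf) != self.nc:
            return False
        self.f.write(bytes(lbuf))
        self.written += 1
        return True

    def close(self):
        if self.type == "new":
            # Record the number of lines actually written, as IMAGECLOSE does.
            self.f.seek(0)
            self.f.write(HEADER.pack(self.written, self.nc))
        self.f.close()
```

Reserving the header at open time and rewriting it on close is what lets a program open a 'new' file before it knows how many lines it will produce.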

3.4 Camera Selection

SELCAM(CAMNUM, FB)

selects camera CAMNUM (currently 1, 2, or 3) as input to the
digitizer, and puts frame buffer FB in acquire mode to
store digitized video. CAMNUM = 0 freezes frame buffer
FB after allowing at least one full frame to be stored.

3.5 IMFEX Control 

The routines described below are called to program IMFEX. 

IMFOUTPUT(DISPCODE)
selects raw image data or one of the processed outputs
to be passed to the frame buffer. For convenience, the
display codes are defined as symbolic constants:

0 - IMF_RAW       raw image
1 - IMF_GRAD      gradient
2 - IMF_THIN      thinning
3 - IMF_THRESH    thresholding
4 - IMF_LUT       lookup table
5 - IMF_BLOB      thresholded video
6 - IMF_BLOBLUT   LUT applied to thresholded video


IMFTHRESH(T) 

sets the threshold in the thresholding unit to T, 0 ≤ T ≤ 255.


LOADLUT(TABLE)

loads the IMFEX lookup table from the array TABLE.
TABLE[i] = 0 or 1, 0 ≤ i ≤ 511.


READLUT(FNAME, TABLE)

reads a lookup table from an ASCII disk file FNAME into 
the array TABLE. 


3.6 Cursor 

A cursor is generated in hardware and mixed with the video
display as a cross hair. The cursor may be displayed at an
arbitrary location I, J to show the output of a program such as
an object tracker. The cursor may also be positioned manually
(keyboard, joystick) and its position returned to a program,
allowing the operator to designate points or regions of
interest. There is also a software blinking cursor, which is
displayed as a 5x5 pixel pattern alternately written into the
frame buffer as black (0) and white (255).


PUTCUR(I,J) 

displays the hardware cursor at image location (I, J).

CURSOR(I, J[, FB])

allows the operator to position the cursor (software or
hardware) at a desired location. Typing "escape"
terminates the procedure with the location returned in
I, J. The optional FB argument specifies which frame
buffer to use for the software cursor. The default value
is 0.


3.7 Graphics 

Graphics routines are used to display the results of image 
analysis routines (for demonstration and debugging purposes). 
Additional displays are generated as necessary using 
PUTPNT/PUTLINE. 

DRAWLINE(FB, I1, J1, I2, J2, G)

draws a "straight" line between image points (I1, J1)
and (I2, J2) in frame buffer FB. The line is displayed
at intensity G.
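The document does not say how DRAWLINE chooses which pixels to light. One common choice for a "straight" digital line is Bresenham's integer algorithm; the following Python sketch uses it as an assumption, with the frame buffer modeled as a list of rows indexed fb[J][I].

```python
def drawline(fb, i1, j1, i2, j2, g):
    """Bresenham line from (i1, j1) to (i2, j2) at intensity g (all octants)."""
    di, dj = abs(i2 - i1), -abs(j2 - j1)
    si = 1 if i1 < i2 else -1
    sj = 1 if j1 < j2 else -1
    err = di + dj  # running error term, integer arithmetic only
    i, j = i1, j1
    while True:
        fb[j][i] = g
        if i == i2 and j == j2:
            break
        e2 = 2 * err
        if e2 >= dj:  # step in I
            err += dj
            i += si
        if e2 <= di:  # step in J
            err += di
            j += sj
```

The integer-only error term suits a minicomputer without fast floating point, which is one reason this algorithm is a common default.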

PUTCHAR(FB, C, I, J[, G])

displays a single character C at I, J in frame buffer
FB. Each character is generated in a 5x7 dot matrix.
I, J refers to the center of the matrix.

PUTSTRING(FB, S, I, J[, G])

displays a string of characters in frame buffer FB
starting at image location I, J.




3.8 Image Utility program 

The program PICTURE is a stand-alone utility program which
allows one to interactively perform a variety of image-related
operations. Commands are provided to make individual calls to
many of the routines described in this section (e.g., SELCAM,
CURSOR, DRAWLINE, IMFOUTPUT, etc.), allowing convenient and
flexible control of the vision system for test purposes. The
program is also useful for saving images (real or artificial)
to be used as control data sets during program development and
debugging.

The program provides access to image data in one of the
frame buffers, a disk file, or an internal "picture" array. The
picture array buffers image data in floating point format; i.e.,
intensity levels are represented as single precision floating
point numbers (32 bits). Operations such as smoothing, adding
random noise, and generating artificial images are performed on
the picture array using floating point arithmetic.

Each command is a single character, including a help command
("?") which displays a menu of the available commands on the
terminal. Some of the functions provided enable the operator to:

1. Move data between any two storage areas (frame buffer, disk
file, picture array).

2. Select and freeze (unfreeze) a frame buffer.

3. Select digitizer video input.

4. Program IMFEX. 

5. Add random noise to an image. 

6. Generate an artificial image. 
4. FEATURE EXTRACTION


This section describes routines which extract image features
based solely on the 2-D image array data. Routines of this type
generally fall into one of two categories: clustering algorithms
such as region growers, and convolution algorithms such as edge
detectors. Clustering algorithms extract regions from images
based on a property, such as intensity, which is common to
neighboring pixels. Convolution algorithms transform the image
array (to enhance contrast edges, for example) by performing some
operation defined for a small window (3x3, 5x5, etc.) centered at
each pixel in the image. Also included in this section is a
stereo correlation algorithm which is essentially a convolution
operation (gray level correlation), although the operation of
the algorithm is constrained by the 3-D camera model.

4.1 Blob Extraction 

The routine BLOBEXT scans a binary image (obtained by
thresholding) to produce a "blob tree" description of the image.
Each node in the blob tree represents a connected region of
uniform color (black or white). The hierarchy of the tree
represents containment; i.e., nodes at level K represent blobs
which are completely surrounded by the blob represented in the
parent node at level K-1. Figure 4-1 shows a simple binary image
and the corresponding blob tree. The approach used here is
similar to that of the SRI Vision Module [2].


Figure 4-1. Blob Tree Representation of Binary Image
Several statistical descriptors are obtained for each blob
and stored in the corresponding node of the blob tree: 

Area 

Perimeter 

1st moments (ΣI, ΣJ)

2nd moments (ΣI², ΣIJ, ΣJ²)

Minimum enclosing rectangle (Imin, Imax, Jmin, Jmax) 

Number of holes (blobs bordering and surrounded by this 
blob) 

Color (black/white) 
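The moment descriptors above can be accumulated in a single pass over a blob's pixels. This Python sketch computes area, the first and second moments, the minimum enclosing rectangle, and the centroid derived from them. The input format (a list of (I, J) pixel coordinates) and the function name are assumptions; perimeter, hole count, and color are left out because their exact definitions in BLOBEXT are not given here.

```python
def blob_descriptors(pixels):
    """pixels: iterable of (I, J) coordinates belonging to one blob."""
    area = 0
    s_i = s_j = s_ii = s_ij = s_jj = 0
    imin = jmin = float("inf")
    imax = jmax = float("-inf")
    for i, j in pixels:
        area += 1
        s_i += i           # 1st moments
        s_j += j
        s_ii += i * i      # 2nd moments
        s_ij += i * j
        s_jj += j * j
        imin, imax = min(imin, i), max(imax, i)
        jmin, jmax = min(jmin, j), max(jmax, j)
    return {
        "area": area,
        "moments1": (s_i, s_j),
        "moments2": (s_ii, s_ij, s_jj),
        "rect": (imin, imax, jmin, jmax),
        "centroid": (s_i / area, s_j / area),
    }
```

Because the moments are plain sums, descriptors of two blobs that are found to be connected can be merged simply by adding their sums, which is convenient for a line-by-line scan.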

BLOBEXT also produces a run-length encoding of the image. 
This consists of a table of run-length record lists, one list per 
line of the image. Individual run-length records represent 
chains of adjacent pixels which are all white or all black. 
Specifically, a run-length record contains three pieces of
information:

1. leftmost column address I,

2. length of the run,

3. color.
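Building the per-line run-length records is straightforward. This Python sketch encodes one line of a binary image as (I, length, color) tuples and the whole image as one list per line; the tuple layout and function names are assumptions standing in for BLOBEXT's record structure.

```python
def encode_line(line):
    """Run-length encode one binary image line as (I, length, color) records."""
    runs = []
    start = 0
    for i in range(1, len(line) + 1):
        # A run ends at the end of the line or where the color changes.
        if i == len(line) or line[i] != line[start]:
            runs.append((start, i - start, line[start]))
            start = i
    return runs


def encode_image(image):
    """One run-length record list per image line."""
    return [encode_line(line) for line in image]
```

For example, `encode_line([0, 0, 1, 1, 1, 0])` yields three records: a black run of 2 at I = 0, a white run of 3 at I = 2, and a black run of 1 at I = 5.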

4.2 Region Grower 

The region grower routine REGGROW is based on the "blooming"
region grower described in [3]. Each call to the region grower
extracts a single region, represented by a record similar to the
blob tree node defined above.

The region grower is called with the coordinates of a "seed"
point, which by definition belongs to the region. The 4- or
8-connected nearest neighbors of the seed are placed in a
candidate queue. Candidates are removed from the queue one at a
time. If the candidate passes an acceptance test (see below),
the point is added to the region and its unprocessed neighbors
are placed in the queue. Processing continues in this manner
until the queue is empty. The region grower is constrained to a
user-selectable rectangular window defined by the initialization
routine INITGROW, which must be called before the first call to
REGGROW. REGGROW may be called multiple times to extract several
non-overlapping region records in a given window.

An argument in the REGGROW call selects one of the following
four acceptance tests:

1. G(I,J) < Threshold

2. G(I,J) > Threshold

3. Accept point (I,J) as long as Gmax - Gmin < Threshold

4. Accept point (I,J) if

$$\frac{\left(G(I,J) - \mu(I,J)\right)^2}{\sigma^2(I,J)} < t$$

where μ(I,J) and σ(I,J) represent the local estimates of
the average and standard deviation of gray levels in the
region.
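The queue-driven growth and the four acceptance tests can be sketched in Python as follows. Images are lists of rows indexed img[J][I]. The function name, the `claimed` set used to keep successive calls non-overlapping, and the variance floor `min_var` in test 4 (needed because a one-pixel region has zero variance) are assumptions for illustration, not details of REGGROW.

```python
from collections import deque

N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def grow_region(img, seed, window, test, threshold,
                neighbors=N4, claimed=None, min_var=1.0):
    """Grow one region from seed = (I, J) inside window = (imin, imax, jmin, jmax).

    test selects acceptance test 1-4; returns the set of accepted (I, J) points.
    Points in `claimed` (regions from earlier calls) are never revisited.
    """
    imin, imax, jmin, jmax = window
    claimed = set() if claimed is None else claimed
    si, sj = seed
    g0 = img[sj][si]
    region = {seed}                          # the seed belongs to the region
    seen = {seed} | claimed
    n, total, total_sq = 1, g0, g0 * g0      # running statistics (test 4)
    gmin = gmax = g0                         # running extremes (test 3)
    queue = deque()

    def enqueue_neighbors(i, j):
        for di, dj in neighbors:
            p = (i + di, j + dj)
            if imin <= p[0] <= imax and jmin <= p[1] <= jmax and p not in seen:
                seen.add(p)
                queue.append(p)

    enqueue_neighbors(si, sj)
    while queue:
        i, j = queue.popleft()
        g = img[j][i]
        if test == 1:
            ok = g < threshold
        elif test == 2:
            ok = g > threshold
        elif test == 3:
            ok = max(gmax, g) - min(gmin, g) < threshold
        else:
            mu = total / n
            var = max(total_sq / n - mu * mu, min_var)  # hypothetical floor
            ok = (g - mu) ** 2 / var < threshold
        if ok:
            region.add((i, j))
            n, total, total_sq = n + 1, total + g, total_sq + g * g
            gmin, gmax = min(gmin, g), max(gmax, g)
            enqueue_neighbors(i, j)
    claimed |= region
    return region
```

Marking points as seen when they are queued, rather than when they are tested, ensures each candidate is examined exactly once, matching the "unprocessed neighbors" rule above.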

4.3 Edge Detection 

This section describes the processing performed by IMFEX. 
While these algorithms could be done in software, they are 
currently implemented in hardware. (The hardware runs in real
time, compatible with the camera frame rate of 60 Hz, while a
software implementation is much slower.) A description
is included here for completeness. 


GRADIENT The gradient operator calculates the magnitude of
the average gradient at each pixel and a coarse estimate of
edge orientation. Two 3x3 convolutions are performed in
parallel and summed. The absolute value of this sum represents
the average gradient. The orientation is estimated by the
equivalent of an arctangent operation (see Fig. 4-2).

THINNING The thinning algorithm uses non-maximum
suppression to thin edges: the average gradient of each pixel
is compared to that of its two nearest neighbors orthogonal to
the estimated edge orientation, and the pixel is kept only if
it is the largest of the three. The output of this step at each
pixel is either the input gradient or zero.

THRESHOLDING This step converts the array of gradient
values to a binary edge map by comparing each value to a
threshold. The threshold is set in software using the routine
IMFTHRESH. In an alternate mode of operation, the raw video
can be thresholded to produce a binary image.

TABLE-LOOKUP The binary edge map (or image) can be further
processed by a lookup-table algorithm. The nine cells of a
3x3 window are ordered so that each combination of edge/no-edge
(1/0) data corresponds to a unique 9-bit binary number
(0-511). This number is used to address a 512 x 1 memory.
The value stored in the addressed location replaces the
center pixel of the corresponding 3x3 window. The lookup
table can be programmed to remove isolated edges, fill
single-pixel gaps, perform further thinning, and detect
patterns such as vertices.
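A software model of the first three IMFEX stages may help clarify the data flow. The following Python sketch is an assumption-laden stand-in for the hardware: it uses Sobel kernels for the two 3x3 convolutions, takes |Gx| + |Gy| as the gradient magnitude, quantizes orientation into four directions with tan(22.5°) comparisons in place of an arctangent, and applies the test IF MAX{x, a, b} = x along the quantized gradient direction (that is, across the edge). Border pixels are set to zero; all names are hypothetical.

```python
TAN_22_5 = 0.4142  # tan(22.5 degrees)

# Neighbor offset (dI, dJ) along each quantized gradient direction.
OFFSETS = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (1, -1)}


def direction(gx, gy):
    """Quantize gradient direction: 0 horizontal, 1 diagonal (same-sign
    components), 2 vertical, 3 diagonal (opposite-sign components)."""
    ax, ay = abs(gx), abs(gy)
    if ay <= TAN_22_5 * ax:
        return 0
    if ax <= TAN_22_5 * ay:
        return 2
    return 1 if gx * gy > 0 else 3


def gradient(img):
    """Sobel magnitude |gx| + |gy| and quantized direction; img[J][I]."""
    h, w = len(img), len(img[0])
    mag = [[0] * w for _ in range(h)]
    dirn = [[0] * w for _ in range(h)]
    for j in range(1, h - 1):
        for i in range(1, w - 1):
            gx = (img[j - 1][i + 1] + 2 * img[j][i + 1] + img[j + 1][i + 1]
                  - img[j - 1][i - 1] - 2 * img[j][i - 1] - img[j + 1][i - 1])
            gy = (img[j + 1][i - 1] + 2 * img[j + 1][i] + img[j + 1][i + 1]
                  - img[j - 1][i - 1] - 2 * img[j - 1][i] - img[j - 1][i + 1])
            mag[j][i] = abs(gx) + abs(gy)
            dirn[j][i] = direction(gx, gy)
    return mag, dirn


def thin(mag, dirn):
    """Non-maximum suppression: output x only if MAX{x, a, b} = x, else 0."""
    h, w = len(mag), len(mag[0])
    out = [[0] * w for _ in range(h)]
    for j in range(1, h - 1):
        for i in range(1, w - 1):
            di, dj = OFFSETS[dirn[j][i]]
            x, a, b = mag[j][i], mag[j + dj][i + di], mag[j - dj][i - di]
            out[j][i] = x if x >= a and x >= b else 0
    return out


def threshold(edges, t):
    """Binary edge map: 1 where the thinned gradient exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in edges]
```

On a horizontal intensity ramp, the gradient stage produces a three-pixel-wide response and thinning keeps only its central maximum, which is the behavior the THINNING step is meant to provide.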
Figure 4-2. IMFEX Processing: Gradient Magnitude
and Orientation, Thinning, Lookup Table
(Panels: gradient magnitude; gradient orientation; thinning,
IF MAX {x, a, b} = x THEN OUTPUT x ELSE OUTPUT 0, with a, b
selected according to the orientation of x; lookup table,
mapping the 3x3 neighborhood cells a-h to an output of 0 OR 1.)
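The table-lookup stage of Section 4.3 amounts to indexing a 512-entry table with the nine bits of each 3x3 window. The following Python sketch assumes a row-major bit order with the upper-left cell as the most significant bit (the hardware's actual cell ordering may differ) and builds, as one example, a table that removes isolated edge points; the function names are hypothetical.

```python
CENTER_BIT = 1 << 4  # the center is the 5th of 9 cells in row-major order


def window_address(b, i, j):
    """9-bit address formed from the 3x3 binary window centered at (I, J)."""
    addr = 0
    for dj in (-1, 0, 1):
        for di in (-1, 0, 1):
            addr = (addr << 1) | b[j + dj][i + di]
    return addr


def isolated_point_table():
    """TABLE[i] in {0, 1}, 0 <= i <= 511: clear a 1 that has no 1-neighbors."""
    table = [0] * 512
    for addr in range(512):
        center = 1 if addr & CENTER_BIT else 0
        neighbors = addr & ~CENTER_BIT
        table[addr] = 0 if center and neighbors == 0 else center
    return table


def apply_lut(b, table):
    """Replace each interior pixel by TABLE[window address]; borders are copied."""
    out = [row[:] for row in b]
    for j in range(1, len(b) - 1):
        for i in range(1, len(b[0]) - 1):
            out[j][i] = table[window_address(b, i, j)]
    return out
```

Other operations listed in the text (gap filling, further thinning, vertex detection) differ only in how the 512 table entries are filled; the addressing is the same.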
4.4 Stereo Correlation


The stereo correlation routine CORREL is used to obtain 3-D
measurements of selected points in a scene [4]. Assume that a
point on the surface of an object is visible in both cameras at
image points (I1, J1) and (I2, J2), respectively. Given one of
the image points, say (I1, J1), CORREL searches for the
conjugate point (I2, J2) in the other image, using correlation
of gray levels as the matching function.

The search for a match is restricted to points along a line
segment, as shown in Figure 4-3. The search segment is defined by
placing upper and lower bounds on the a priori estimated range to
the object and projecting the corresponding segment of the ray
R(I1, J1) into the second image. Using a programmable sampling
mask, the second image is sampled at each point along the search
segment. Each sample is correlated with a spatially equivalent
sample obtained from the first image, using the standard
statistical correlation coefficient formula. The correct match
is identified as the point which maximizes the correlation
coefficient.
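The search along the segment can be sketched in Python as follows. The sketch assumes that the two endpoints of the search segment in the second image have already been computed from the camera model, uses a square 5x5 sampling mask as a stand-in for CORREL's programmable mask, and omits the autocorrelation pre-test; all function names are hypothetical.

```python
import math


def correlation(a, b):
    """Standard statistical (Pearson) correlation coefficient of two samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # a flat sample carries no matching information
    return sab / math.sqrt(saa * sbb)


def sample(img, i, j, mask):
    """Gray levels of img[J][I] at the mask offsets around (I, J)."""
    return [img[j + dj][i + di] for di, dj in mask]


SQUARE_5X5 = [(di, dj) for dj in range(-2, 3) for di in range(-2, 3)]


def search_segment(img1, img2, p1, start, end, steps, mask=SQUARE_5X5):
    """Best match in img2 for point p1 of img1 among steps+1 points from start to end."""
    ref = sample(img1, p1[0], p1[1], mask)
    best, best_r = None, -2.0
    for k in range(steps + 1):
        t = k / steps
        i = round(start[0] + t * (end[0] - start[0]))
        j = round(start[1] + t * (end[1] - start[1]))
        r = correlation(ref, sample(img2, i, j, mask))
        if r > best_r:
            best, best_r = (i, j), r
    return best, best_r
```

Because the correlation coefficient is normalized by both samples' variances, the match is insensitive to gain and offset differences between the two cameras.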

Before beginning the search, an autocorrelation test is
performed in the neighborhood of (I1, J1). The purpose of this
test is to determine whether there is a well-defined peak which
will produce an unambiguous local maximum correlation value in
the second image. If the autocorrelation test fails, the mask is
increased in size and the test is repeated. This is done at most
N times (typically N = 3), and if the test fails in all cases,
CORREL returns a failure code without attempting to find the
matching point.

Assuming CORREL is successful, the 3-D coordinates of
the point are obtained by calling INISCT (Section 6).





5. CAMERA CALIBRATION



This section describes the camera model we use to make 3-D
measurements, and the procedure for calibrating the cameras.

The cameras are calibrated empirically by viewing several
known 3-D points and recording the coordinates of the
corresponding image points. These data are then used to solve
for the camera model parameters using least-squares techniques.

One advantage of this approach is that it is not necessary
to have prior knowledge of camera parameters such as the 3-D
location of the lens center, the orientation of the principal
axis, the focal length, and the location of the image plane in
the reference 3-D coordinate frame. This is a very important
consideration when using non-photogrammetric cameras, since
these parameters can be extremely difficult to measure directly.
Another advantage relates to stereo matching. Using the camera
model, the line in one image plane containing all possible match
points for a given point in the other image can be determined
without requiring that the image planes be coplanar with precise
line-to-line registration.

The accuracy of the camera model is tested by comparing the
observed image coordinates of a known 3-D point to the values
computed with the calibration parameters. Typically the average
error is 0.1 pixel, with a worst-case error of 0.4 to 0.5 pixel.
These error estimates reflect sub-pixel accuracy measurements
obtained by calculating the centroid of a region covering
approximately 100 pixels. However, they indicate at most a
1/2-pixel error for discrete measurements based on a single
pixel.
5.1 Camera Model


The camera model is based on a standard perspective
transformation which projects points in 3-D space through a
single point, the lens center, onto the image plane [4]. The
form that we use is defined by two equations which give the
horizontal (I) and vertical (J) components of the projection of a
3-D point X:

(X-C ,H) 

I - ; 

(X-C ,A) 

(X-C.V) 

J M — ' 

(X-C, A) 

where C is the position of the lens center in a reference 3-D
coordinate system, A is the principal axis of the lens, H is the
"horizontal" vector, and V is the "vertical" vector. Note that
(X - C, H) represents the dot product of the vectors X - C and H.

The geometrical interpretation of these vectors is
illustrated in Fig. 5-1. C is expressed in mm, A is a unit
vector, and H and V are scaled so that I and J are in "pixel"
units. Figure 5-1 is simplified by showing the image coordinate
origin in the center (i.e., at the projection of A) and by showing
A, H, and V as mutually orthogonal. The image origin is actually
located at the upper left corner to agree with frame buffer
addressing, with the result that H and V are skewed to compensate
for the translation.
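The two projection equations translate directly into code. This Python sketch (the function name is hypothetical) also illustrates the skew mentioned above: for an ideal camera with focal length f in pixels and image center (I0, J0), taking H = f*H' + I0*A and V = f*V' + J0*A, where H' and V' are unit vectors orthogonal to A, reproduces I = f*x/z + I0 and J = f*y/z + J0 with the origin at the upper left corner. The values of f, I0, and J0 are made up for illustration.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def xyz_to_image(C, A, H, V, X):
    """I = (X-C, H) / (X-C, A),  J = (X-C, V) / (X-C, A)."""
    d = [x - c for x, c in zip(X, C)]
    den = dot(d, A)
    return dot(d, H) / den, dot(d, V) / den


# Hypothetical ideal camera: f = 500 pixels, image center (160, 120), looking along +z.
f, I0, J0 = 500.0, 160.0, 120.0
C = [0.0, 0.0, 0.0]
A = [0.0, 0.0, 1.0]
H = [f * hx + I0 * ax for hx, ax in zip([1.0, 0.0, 0.0], A)]  # skewed by I0*A
V = [f * vy + J0 * ax for vy, ax in zip([0.0, 1.0, 0.0], A)]  # skewed by J0*A

print(xyz_to_image(C, A, H, V, [100.0, 200.0, 1000.0]))  # (210.0, 220.0)
```

The added I0*A term contributes exactly I0 to every projected I, since (X - C, I0*A) / (X - C, A) = I0; that is the "skew" that moves the origin to the corner.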
Figure 5-1. Camera Calibration Model


5.2 Calibration Solution 


The camera model is obtained by the program SOLVECAL, which
computes values for the vectors C, A, H, and V. The input to this
program is a list of N measurements $P_k = (I_k, J_k, x_{k1}, x_{k2}, x_{k3})$,
each representing the observed image $(I_k, J_k)$ of a known 3-D point
$X_k = (x_{k1}, x_{k2}, x_{k3})$. The minimum number of measurements
required is 6, since each measurement contributes two equations in
the 11 unknowns described below. However, we typically obtain 100
to 200 measurements and solve for the calibration vectors using
least-squares techniques. Also, the 3-D points must not all be
coplanar. Each $X_k$ is used to write two equations based on the
defining equations given in the previous section:


$$I_k x_{k1} a_1 + I_k x_{k2} a_2 + I_k x_{k3} a_3 - I_k (C,A) - x_{k1} h_1 - x_{k2} h_2 - x_{k3} h_3 + (C,H) = 0$$
$$J_k x_{k1} a_1 + J_k x_{k2} a_2 + J_k x_{k3} a_3 - J_k (C,A) - x_{k1} v_1 - x_{k2} v_2 - x_{k3} v_3 + (C,V) = 0$$

These equations are obtained by rewriting the equations for I
and J so that all terms appear on the left-hand side, and then
expanding the dot products. In this manner we obtain a
homogeneous system of 2*N equations in the 12 unknowns

$$a_1, a_2, a_3, h_1, h_2, h_3, v_1, v_2, v_3, (C,A), (C,H), (C,V).$$

In order to solve this system of equations, it is necessary
to assign an arbitrary value to one of the unknowns so that we 
get a nonhomogeneous system of 2*N equations in 11 unknowns. For 
this purpose, the operator selects one component of A to be 
assigned the value 1. The selected component should be the one 
which appears to be closest to 1 based on the orientation of the 
camera with respect to the reference 3-D coordinate frame. 

After the initial system of equations is solved, the program 
solves for C using the following 3x3 system of equations: 


$$c_1 a_1 + c_2 a_2 + c_3 a_3 = (C,A)$$
$$c_1 h_1 + c_2 h_2 + c_3 h_3 = (C,H)$$
$$c_1 v_1 + c_2 v_2 + c_3 v_3 = (C,V)$$

where $C = (c_1, c_2, c_3)$.

Finally, A, H, and V are scaled by 1/|A|, with the result that
the final value of A is a unit vector.

The solution is verified by comparing the observed $I_k$'s and
$J_k$'s with the values predicted by the camera model:

$$\Delta I_k = I_k - \frac{(X_k - C, H)}{(X_k - C, A)}, \qquad \Delta J_k = J_k - \frac{(X_k - C, V)}{(X_k - C, A)}$$

If either $|\Delta I_k|$ or $|\Delta J_k|$ exceeds a threshold, $P_k$ is removed
from the data set, and a new solution is obtained. This cycle is
repeated until all errors are less than the operator-specified
threshold, typically around 1 pixel.
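The complete solution cycle can be sketched in Python as follows. This is a hedged reconstruction from the equations above, not SOLVECAL itself: it forms the 2N linear equations, fixes the operator-selected component of A to 1, solves the least-squares problem through column-scaled normal equations (our choice of solver; the document does not name one), recovers C from the 3x3 system, scales by 1/|A|, and repeats after discarding points whose residuals exceed the threshold. All function names are hypothetical.

```python
import math


def solve_linear(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [list(M[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def project(cam, X):
    C, A, H, V = cam
    d = [x - c for x, c in zip(X, C)]
    return dot(d, H) / dot(d, A), dot(d, V) / dot(d, A)


def solve_cahv(points, fixed):
    """points: list of (I, J, (x1, x2, x3)); fixed: index 0-2 of the A component set to 1.
    Unknown order: a1 a2 a3 h1 h2 h3 v1 v2 v3 (C,A) (C,H) (C,V)."""
    rows, rhs = [], []
    for I, J, (x1, x2, x3) in points:
        for full in ([I * x1, I * x2, I * x3, -x1, -x2, -x3, 0, 0, 0, -I, 1, 0],
                     [J * x1, J * x2, J * x3, 0, 0, 0, -x1, -x2, -x3, -J, 0, 1]):
            rhs.append(-full[fixed])                  # fixed term moves to the right
            rows.append(full[:fixed] + full[fixed + 1:])
    n = len(rows[0])
    # Scale columns to unit norm to improve conditioning of the normal equations.
    scale = [math.sqrt(sum(r[c] ** 2 for r in rows)) or 1.0 for c in range(n)]
    R = [[r[c] / scale[c] for c in range(n)] for r in rows]
    N = [[sum(r[p] * r[q] for r in R) for q in range(n)] for p in range(n)]
    t = [sum(r[p] * y for r, y in zip(R, rhs)) for p in range(n)]
    u = [y / s for y, s in zip(solve_linear(N, t), scale)]
    u.insert(fixed, 1.0)
    A, H, V = u[0:3], u[3:6], u[6:9]
    C = solve_linear([A, H, V], u[9:12])              # the 3x3 system for C
    k = 1.0 / math.sqrt(dot(A, A))                    # scale by 1/|A|
    return C, [k * a for a in A], [k * h for h in H], [k * v for v in V]


def solvecal(points, fixed, threshold=1.0):
    """Solve, drop points whose |dI| or |dJ| exceeds threshold, and repeat."""
    pts = list(points)
    while True:
        if len(pts) < 6:
            raise ValueError("fewer than 6 measurements remain")
        cam = solve_cahv(pts, fixed)
        keep = [(I, J, X) for I, J, X in pts
                if abs(I - project(cam, X)[0]) <= threshold
                and abs(J - project(cam, X)[1]) <= threshold]
        if len(keep) == len(pts):
            return cam, pts
        pts = keep
```

A typical call is `cam, used = solvecal(points, fixed=2)` when the camera looks roughly along the reference z axis, so that the third component of A is the one closest to 1.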

5.3 Calibration Data Acquisition (CALDATACQ) 

The camera calibration data acquisition program uses the 
region grower (Section 4.2) to locate a target attached to the 
end of a PUMA-600 robot. The target may be any easily detected 
figure whose centroid position is known relative to the position 
of the PUMA. We use a black disk drawn on white paper attached 
to the center of the flange at the end of the robot. The 
centroid of the disk coincides with the position of the robot 
calculated by VAL, the PUMA controller. 
CALDATACQ moves the PUMA through a sequence of 128 locations
so that the target is positioned at each point of two 8x8 grids
contained in distinct planes. At each position, the target
location (x, y, z) and the corresponding image centroid obtained
by the region grower are recorded in a disk file, which is later
input to SOLVECAL. The target grids are defined interactively
as described below; data acquisition then proceeds
automatically.

GRID DEFINITION

The operator is prompted to move the PUMA to four locations
so that the target appears near the upper-left, upper-right,
lower-right, and lower-left corners of the image. At each
location, the operator positions a cursor on the target, and the
program records the cursor position (I, J) and the target position
(x, y, z). This defines two quadrilaterals, one in the image plane
and one in 3-D space.

The program then effectively defines two corresponding 8x8
grids, one in 3-D space and one in the image plane, by
calculating 8 equally spaced points, including end points, on each
side of the quadrilaterals defined by the two sets of four corner
points. The grid points are determined by the intersections of
lines connecting corresponding points on opposite sides of the
quadrilaterals.
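Because each grid point lies on both of the connecting lines that define it, the construction is equivalent to bilinear interpolation between the four corners, which gives a compact way to compute it. The following Python sketch (the function name is hypothetical) works for both the image quadrilateral and the 3-D one.

```python
def grid_points(ul, ur, lr, ll, n=8):
    """n x n grid spanning the quadrilateral with corners upper-left, upper-right,
    lower-right, lower-left (2-D or 3-D tuples).

    Point (r, c) is where the line joining the c-th points of the top and bottom
    sides crosses the line joining the r-th points of the left and right sides;
    that intersection equals bilinear interpolation of the corners.
    """
    grid = []
    for r in range(n):
        t = r / (n - 1)          # fraction of the way down
        row = []
        for c in range(n):
            s = c / (n - 1)      # fraction of the way across
            row.append(tuple((1 - s) * (1 - t) * p + s * (1 - t) * q
                             + s * t * u + (1 - s) * t * w
                             for p, q, u, w in zip(ul, ur, lr, ll)))
        grid.append(row)
    return grid
```

With n = 8, the parameters s and t step by 1/7, giving the 8 equally spaced points per side, end points included.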

DATA ACQUISITION

The data acquisition loop moves the robot to position the
target at each 3-D grid point. The program then attempts to
locate the target by sampling the image at the corresponding
image grid point. If the sampled point is black, it is assumed
to be in the target disk. If the point is not black, a spiral
search is initiated, which terminates successfully when a black
point is located, or unsuccessfully if an error bound of 10 pixels
is exceeded, in which case the operator is prompted to position a
cursor on the target. The region grower is then called, starting
at the point determined (by one of the above means) to be on the
target, to locate the entire disk and calculate its centroid.
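The spiral search can be sketched as a scan of square rings of increasing radius around the predicted grid point. In this Python sketch the ring ordering, the interpretation of the 10-pixel error bound as a maximum ring radius, the gray-level threshold used to decide "black", and the function name are all assumptions.

```python
def spiral_search(img, i0, j0, max_radius=10, black_below=128):
    """Return the first "black" pixel (gray level < black_below) found on square
    rings of radius 0, 1, ..., max_radius around (i0, j0); None if none is found."""
    h, w = len(img), len(img[0])
    for r in range(max_radius + 1):
        for dj in range(-r, r + 1):
            for di in range(-r, r + 1):
                if max(abs(di), abs(dj)) != r:
                    continue  # interior points were visited on smaller rings
                i, j = i0 + di, j0 + dj
                if 0 <= i < w and 0 <= j < h and img[j][i] < black_below:
                    return i, j
    return None
```

A `None` result corresponds to the case in which the operator is prompted to position the cursor on the target.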

Each time the region grower finishes, the operator is
prompted to accept the result or to try again with a higher or
lower acceptance threshold in the region grower.

The grid definition/data acquisition sequence is then
repeated for the second 8x8 grid. The entire process may then be
repeated for additional cameras as desired.

5.4 CALIBRATION DATA FILE MANAGEMENT 

User programs requiring camera calibration data acquire it
from files created and updated by the program NEWCAMCAL. The
standard calibration file contains data for each of the cameras
being used in the laboratory. A data record for each camera
contains the calibration vectors C, A, H and V, a name (e.g.,
"LEFT", "RIGHT"), and the image size.

NEWCAMCAL is normally run after SOLVECAL to update the 
standard calibration file. It can also be used to create 
alternate files. Some uses of the latter capability are 1) 
generating artificial calibration data for test purposes, and 2) 
creating calibration files corresponding to images taken by a 
camera other than the ones in our lab. 

User programs access calibration data by calling the
procedure SETCAMCAL, which loads the standard calibration file, or
SETCAMCALUSER(FILENAME), which loads an alternate calibration file
as specified by the FILENAME argument.
6. COORDINATE TRANSFORMS


The routines described in this section perform a variety of
coordinate transform functions involving the camera calibration
model. SETCAMCAL or SETCAMCALUSER must be called to bring
calibration data into the program before calling any of these
routines.

The two basic functions are XYZToImage, which computes the
image (I,J) of an arbitrary 3-D point X = (x,y,z), and ImageToRay,
which calculates the ray R(I,J) containing all 3-D points whose
image is (I,J). Additional functions are defined by imposing
various 3-D constraints.
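The calibration record holds the vectors C, A, H and V. Assuming these follow the common CAHV linear camera model (C the camera center, A the unit optical-axis vector, and H and V the horizontal and vertical image vectors), the two basic functions reduce to a few vector operations. Under that model, I = (X-C)·H / (X-C)·A and J = (X-C)·V / (X-C)·A. The Python sketch below is written under that assumption. It mirrors XYZToImage and ImageToRay but returns values instead of using output arguments, and it is not the library's PASCAL code.

```python
import math

def _sub(a, b): return [x - y for x, y in zip(a, b)]
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _scale(a, s): return [x * s for x in a]
def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def xyz_to_image(cahv, x):
    """Project 3-D point x: I = (x-C).H / (x-C).A, J = (x-C).V / (x-C).A."""
    c, a, h, v = cahv
    d = _sub(x, c)
    denom = _dot(d, a)
    return _dot(d, h) / denom, _dot(d, v) / denom

def image_to_ray(cahv, i, j):
    """Unit vector R such that every point C + t*R (t > 0) images to (i, j)."""
    c, a, h, v = cahv
    # Points on the ray satisfy (x-C).(H - i*A) = 0 and (x-C).(V - j*A) = 0,
    # so the ray direction is perpendicular to both of these vectors.
    r = _cross(_sub(v, _scale(a, j)), _sub(h, _scale(a, i)))
    r = _scale(r, 1.0 / math.sqrt(_dot(r, r)))
    return r if _dot(r, a) > 0 else _scale(r, -1.0)   # in front of camera
```

The sign test on R·A picks the half of the line that lies in front of the camera, which is the only half that can produce an image.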

These routines use the standard arguments listed below. 
Reals are single precision. 

I = image column coordinate (real)

J = image row coordinate (real)

CAM = camera number (integer 1,2,...)

R, X, P, N = 1-dimensional arrays of three reals (x,y,z)

XYZToImage (CAM,I,J,X) 

calculates the projection (I, J) of the 3-D point X into 
camera CAM. 

ImageToRay(CAM,I,J,R)

calculates a unit vector R representing the projection
of image point (I,J) in camera CAM into 3-D space. The
ray defined by R, denoted R(I,J), contains all 3-D
points whose image is (I,J).

ImageToXYZ(CAM,I,J,CONSTRAINT,X[,P,N])

calculates the position X of a point imaged on (I,J) in
camera CAM given a CONSTRAINT (integer 1-4) which
specifies a plane containing the point X. The
specified plane may be parallel to one of the three
coordinate planes (xy, xz, yz) or it may be an arbitrary
plane defined by a point P in the plane and the normal
to the plane N (unit vector). Specifically, a
CONSTRAINT of k = 1, 2 or 3 specifies the plane X[k] =
CONSTANT, with the actual value stored in X[k] by the
calling program (i.e., if k = 3, then X is in the plane
z = X[3] parallel to the xy-plane). One use of this
routine is to measure the position of an object resting
on a level surface at a known height (z-coordinate).

If CONSTRAINT = 4, then P and N are defined by the
calling program. The defaults are P = (0,0,0) and N =
(0,0,1), which is equivalent to CONSTRAINT = 3, X[3] =
0. P and N are ignored if CONSTRAINT = 1, 2 or 3.

INTSCT(CAM1,CAM2,I1,J1,I2,J2,X)

is used to make 3-D measurements based on stereo
triangulation. (I1,J1) and (I2,J2) are the images in CAM1
and CAM2 of a common 3-D point. The coordinates of
this point are returned in X. The computed value of X
corresponds to the midpoint of the segment which
minimizes the distance between the two rays R(I1,J1)
and R(I2,J2).

ImageToImage(CAM1,CAM2,I1,J1,I2,J2,MODE,D)

Given an image point (I1,J1) in CAM1, returns the
coordinates (I2,J2) in CAM2 of the projection of a
point P on the ray R(I1,J1). The point P may be
specified in one of two ways according to the arguments
MODE (integer) and D (real) as follows:

MODE=0 - P is at distance D from CAM1 along R(I1,J1). Note
that if P projects outside of the CAM2 image, then
I2 will be coerced to 0 or 319 as appropriate, with
(I2,J2) representing the projection of an
alternate point P' on R(I1,J1).

MODE=1 - P is defined to be the point on R(I1,J1) whose
image in CAM2 is (D,J2).


This routine is used by the stereo correlation routine
CORREL to define the search segment for stereo matching (see
Section 4.4).
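Given a camera center C and a unit ray direction R, as returned by ImageToRay, the ImageToXYZ plane constraint and the INTSCT triangulation are standard ray geometry. The Python sketch below illustrates both. The function names are hypothetical, and taking the camera centers to be the calibration vectors C is an assumption. CONSTRAINT = 1, 2 or 3 is the same plane test with N set to a coordinate axis.

```python
def _sub(a, b): return [x - y for x, y in zip(a, b)]
def _add(a, b): return [x + y for x, y in zip(a, b)]
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _scale(a, s): return [x * s for x in a]

def ray_plane(c, r, p, n):
    """Point where the ray c + t*r meets the plane through p with normal n."""
    denom = _dot(r, n)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    t = _dot(_sub(p, c), n) / denom
    return _add(c, _scale(r, t))

def ray_midpoint(c1, r1, c2, r2):
    """Midpoint of the shortest segment joining lines c1 + s*r1 and c2 + t*r2."""
    w = _sub(c1, c2)
    a, b, c = _dot(r1, r1), _dot(r1, r2), _dot(r2, r2)
    d, e = _dot(r1, w), _dot(r2, w)
    denom = a * c - b * b            # zero when the rays are parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel")
    s = (b * e - c * d) / denom      # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom      # parameter of the closest point on ray 2
    q1 = _add(c1, _scale(r1, s))
    q2 = _add(c2, _scale(r2, t))
    return _scale(_add(q1, q2), 0.5)
```

Setting the derivatives of |w + s·r1 − t·r2|² with respect to s and t to zero gives the two linear equations solved in `ray_midpoint`. With noise-free data the two closest points coincide, and the midpoint is the exact intersection.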


7. TRACKING AND ACQUISITION 


This section describes three object tracking programs and an
object acquisition program. The first two programs (the correlation
tracker and the label tracker) are based on fairly simple algorithms
which execute in real time (60 Hz). The third program tracks
known 3-D objects using a robust model-based approach which
computes six components of position and velocity (linear and
angular) and tolerates noise and/or partial occlusion. The
acquisition program is used to initialize the latter tracking
program. The front end of the acquisition program is a real-time
vertex tracker. Using rigid body constraints, the vertex
position and velocity estimates lead to a model-matching
procedure which establishes the position and velocity of the
object to initialize the tracker.

Object tracking is an essential component of visual feedback
in dynamically changing environments. Some applications are
grasping moving objects or performing assembly operations on
moving objects with a robot arm; real-time obstacle avoidance and
landmark navigation by an autonomous vehicle; and spacecraft
rendezvous and docking or berthing operations. The basic
requirement in such applications is fast processing. For
standard TV cameras, the maximum rate is 30 Hz (standard RS-170)
or 60 Hz (non-interlaced).

Object tracking algorithms based entirely on software must
be kept simple to execute at video rates on a general-purpose
computer. The correlation tracker and the label tracker fall
into this category. More robust algorithms such as the 3-D
acquisition and tracking algorithms will require much more
computational power to execute at video rates, including
powerful image processing hardware and parallel array processors
for mathematical computations. IMFEX is an example of the kind
of image processing hardware necessary for real-time vision
programs.

7.1 Correlation Tracker 

The correlation tracker is a library module which can be
linked to any application program. The tracker is initiated by
the routine

CTRACK(I,J,CAM,FB1,FB2,FCTR)

I,J - image coordinates (row, column) of object,
initialized by caller, updated by tracker.

CAM - camera number (1,2,...)

FB1, FB2 - frame buffers to be used (0,1,...)

FCTR - frame counter incremented by tracker at 60 Hz.

Tracking is accomplished by correlating successive images
with the initial gray-level sample centered at the point (I,J) in
the call to CTRACK. The search in each image is centered at the
next predicted location, which is the last observed location plus
I,J offsets based on velocity estimates. Velocity is estimated
independently in the I and J directions as the slope of a least-squares
fit line using the last five observations.
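The prediction rule can be sketched in Python as follows. For each coordinate, a least-squares line is fitted to the last five observations, taken one per frame. Its slope, in pixels per frame, is then added to the last observed location. The function names are hypothetical, and treating the observations as equally spaced in time is an assumption that matches the fixed 1/60-second iteration.

```python
def ls_slope(values):
    """Slope of the least-squares line through (k, values[k]), k = 0, 1, ..."""
    n = len(values)
    if n < 2:
        return 0.0                     # not enough history for a velocity
    k_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((k - k_mean) * (v - v_mean) for k, v in enumerate(values))
    den = sum((k - k_mean) ** 2 for k in range(n))
    return num / den

def predict_next(history, window=5):
    """history: observed (I, J) locations, oldest first. Returns the last
    location plus one frame of velocity, estimated separately in I and J."""
    recent = history[-window:]
    i_vel = ls_slope([p[0] for p in recent])
    j_vel = ls_slope([p[1] for p in recent])
    i_last, j_last = recent[-1]
    return i_last + i_vel, j_last + j_vel
```

For example, observations (10,20), (12,19), (14,18), (16,17), (18,16) give slopes of 2 and -1 pixels per frame, so the predicted location is (20, 15).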

The correlation sampling mask consists of the 25 points on
the horizontal, vertical and diagonal bisectors of a 7x7 square.
The search consists of 25 samples centered at the points of a 5x5
square which in turn is centered at the predicted I,J. The
updated I,J represents the point where the correlation is
maximized.
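Each of the four bisectors of a 7x7 square contributes 7 points, and they all share the center, so the mask has 4 x 7 - 3 = 25 points. The Python sketch below builds that mask and runs the 5x5 search. It is an illustration, not the library routine. Positions are (row, column) as in the CTRACK argument description, and the function names are hypothetical. The report does not name the correlation measure, so the sketch assumes a negated sum of absolute gray-level differences, which keeps "best match = maximum score".

```python
# Offsets (dr, dc) of the 25 mask points: the horizontal, vertical and both
# diagonal bisectors of a 7x7 square (4 lines x 7 points, center counted once).
MASK = sorted({(0, d) for d in range(-3, 4)} | {(d, 0) for d in range(-3, 4)}
              | {(d, d) for d in range(-3, 4)} | {(d, -d) for d in range(-3, 4)})

def sample(image, row, col):
    """Gray levels under the mask centered at (row, col)."""
    return [image[row + dr][col + dc] for dr, dc in MASK]

def correlate_search(image, template, pred_row, pred_col):
    """Score the 25 positions of a 5x5 square centered on the prediction and
    return the (row, col) with the highest score (assumed score: negated sum
    of absolute differences against the initial template)."""
    rows, cols = len(image), len(image[0])
    best, best_score = None, None
    for dr in range(-2, 3):
        for dc in range(-2, 3):
            r, c = pred_row + dr, pred_col + dc
            if not (3 <= r < rows - 3 and 3 <= c < cols - 3):
                continue                      # mask would leave the image
            score = -sum(abs(a - b) for a, b in zip(sample(image, r, c), template))
            if best_score is None or score > best_score:
                best, best_score = (r, c), score
    return best
```

The template would be taken once, with `sample(image, I, J)` at the location passed to CTRACK. Each 1/60-second iteration then calls `correlate_search` at the predicted location.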

A new iteration of the tracker is initiated every 1/60
second using the KW11-P interrupts. Images are alternately
frozen in FB1 and FB2 to access images at 60 Hz. Two frame
buffers are required for this scheme since computer access is
blocked while image data is being stored in the frame buffer.

The actual tracking procedure is an Asynchronous System Trap
(AST) routine invoked by the KW11-P interrupt routine. The
variable FCTR is also incremented by the KW11-P interrupt routine
so that the calling program can synchronize processing with the
tracker. This routine was used to demonstrate real-time vision-based
navigation of the JPL Mars rover test vehicle [5].

7.2 Label Tracker

The label tracking program was developed for the general
case of providing 3-D vision feedback to a robot arm manipulating
a moving object [6]. We are using it to conduct laboratory
experiments in tracking and capturing a moving satellite with the
PUMA robot [7]. 3-D measurements obtained by the tracker at 30
Hz are used to estimate the trajectory of the satellite. The
PUMA is commanded to follow a similar trajectory which keeps the
end effector poised near a grapple fixture on the satellite.
This continues indefinitely until the operator issues the grapple
command. Upon receiving this command, the PUMA begins to approach
the satellite until the end effector attaches to the grapple
fixture. The satellite is then slowed gradually until it comes
to rest. PUMA control software resides in a PDP-11/34. A DR11-W
parallel interface on the VAX is used for communication with the
PDP-11/34.

The tracking algorithm requires three labels arranged as the
vertices of a right triangle at known locations on the object
(Figure 7-1). In the satellite capture experiment the labels
are white disks on a black background. The centroids of the
three labels define the x and y axes of an object-centered
coordinate frame, with the z-axis obtained by their cross
product. The position and orientation of the object are
represented by a homogeneous coordinate transformation matrix
${}^{obj}T_{base}$ (read "object" with respect to "base"). This
representation is used to describe the location of the object
since it is also the representation used to specify the desired
location of the end effector, ${}^{hand}T_{base}$, in our robot control
software. In general, the desired end effector position will be
displaced from the origin of the object (one obvious reason is to
avoid occlusion of the labels by the robot). This displacement
is represented by a third coordinate transformation, ${}^{hand}T_{obj}$.
The desired end effector position is calculated as

$$ {}^{hand}T_{base} = {}^{hand}T_{obj} \; {}^{obj}T_{base} $$
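The object frame can be built from the three label centroids as in the Python sketch below. This is a sketch under assumptions the report does not spell out. The right-angle label is taken as the origin, one label lies along +x and the other along +y, and y is recomputed as z x x so that the axes stay exactly orthonormal even though measured centroids will not be perfectly perpendicular. The result is a 4x4 homogeneous matrix whose columns are the object axes and origin expressed in base coordinates. Composing it with the hand offset depends on the matrix convention of the robot control software, which this sketch does not model. The function names are hypothetical.

```python
import math

def _sub(a, b): return [x - y for x, y in zip(a, b)]
def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]
def _unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]

def object_frame(p_origin, p_x, p_y):
    """4x4 homogeneous transform (list of rows) of the object frame in base
    coordinates, from the 3-D centroids of the three labels."""
    x = _unit(_sub(p_x, p_origin))
    z = _unit(_cross(x, _sub(p_y, p_origin)))   # z = x cross y
    y = _cross(z, x)                            # exactly perpendicular to x, z
    return [[x[0], y[0], z[0], p_origin[0]],
            [x[1], y[1], z[1], p_origin[1]],
            [x[2], y[2], z[2], p_origin[2]],
            [0.0, 0.0, 0.0, 1.0]]
```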

The satellite tracking program uses two frame buffers (0 and
1) to acquire a new stereo pair of images every 1/30 second.
Image acquisition is controlled by a KW11-P interrupt routine as
in the correlation tracker. The processing of each image
consists of thresholding three search windows centered at the
predicted label locations. The image coordinates of "white"
pixels are summed ($\sum I$, $\sum J$) and divided by the number
of white pixels to calculate the centroid of the label.
Corresponding centroids from the two images are passed to
INTSCT to obtain the 3-D location of the label. The 3-D
measurements are smoothed by a third-order recursive filter. The
smoothed coordinates are then used to calculate ${}^{obj}T_{base}$.
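The per-label image processing comes down to a thresholded centroid over a small window around the predicted location. The following is a Python sketch, assuming an image indexed `image[J][I]`, a fixed gray-level threshold for "white", and a square window of half-width `half`. The report gives neither the threshold nor the window size, and the function name is hypothetical.

```python
def window_centroid(image, i_pred, j_pred, half, threshold):
    """Centroid (I, J) of pixels brighter than threshold inside the window of
    half-width `half` centered at the predicted location, or None if empty."""
    rows, cols = len(image), len(image[0])
    sum_i = sum_j = count = 0
    for j in range(max(0, j_pred - half), min(rows, j_pred + half + 1)):
        for i in range(max(0, i_pred - half), min(cols, i_pred + half + 1)):
            if image[j][i] > threshold:      # a "white" pixel
                sum_i += i
                sum_j += j
                count += 1
    if count == 0:
        return None                          # label not found in this window
    return sum_i / count, sum_j / count
```

Only three small windows per image are examined, which is what keeps this processing simple enough to run at 30 Hz in software.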

Operator interaction for initialization consists of
entering a value for the filter characteristic time constant
(typically 7) and designating the initial label locations in each
image with a cursor.

7.3 3-D Object Tracker

The 3-D object tracking program TRACKOBJ tracks known
objects represented by wire-frame models [8]. The output of the
tracker is a complete description of the position and velocity of
the object: three linear and three angular components. In its
present form, the tracker only models convex polyhedra. The
modelling capability will be extended to include concave
polyhedra and objects with cylindrical surfaces.

The tracking algorithm consists of four steps: PREDICTION,
PROJECTION, MEASUREMENT, and ADJUSTMENT. A simplified
description of each step is given below:

1) PREDICTION - noting the elapsed time between the current and
previous images, the expected position and orientation of
the object are calculated.

2) PROJECTION - visible edges of the object are projected into 
the image plane. 

3) MEASUREMENT - the IMFEX edge array is searched to locate 
edge elements in the vicinity of the projected edges. The 
discrepancy between observed and predicted edge locations is 
recorded each time an edge is found. 

4) ADJUSTMENT - edge location discrepancies are processed by a 
least squares adjustment to generate new position and 
velocity measurements. 
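The PREDICTION step can be illustrated with a constant-velocity model. Position advances by linear velocity times elapsed time. Orientation is rotated by the angle |ω|Δt about the angular-velocity axis, using Rodrigues' rotation formula. This Python sketch is hedged on several points. The report does not state TRACKOBJ's motion model, the sketch assumes angular velocity expressed in base coordinates and orientation stored as a 3x3 rotation matrix (list of rows), and the function names are hypothetical.

```python
import math

def rotation_about_axis(axis, angle):
    """Rotation matrix for `angle` radians about unit vector `axis` (Rodrigues)."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [[t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
            [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
            [t * x * z - s * y, t * y * z + s * x, t * z * z + c]]

def matmul3(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def predict_pose(position, rotation, lin_vel, ang_vel, dt):
    """Constant-velocity prediction of position and orientation after dt."""
    new_pos = [p + v * dt for p, v in zip(position, lin_vel)]
    rate = math.sqrt(sum(w * w for w in ang_vel))
    if rate * dt < 1e-12:
        return new_pos, [row[:] for row in rotation]   # no measurable rotation
    axis = [w / rate for w in ang_vel]
    return new_pos, matmul3(rotation_about_axis(axis, rate * dt), rotation)
```

The predicted pose would then feed the PROJECTION step, and the ADJUSTMENT step would correct both the pose and the velocities used here.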

Important features of the tracking algorithm include:

1) significant noise is tolerated.

2) missing edges, due either to poor contrast or to occlusion, can
be tolerated.

3) the algorithm will work with any number of cameras, including
only one.

The program is initialized by the operator, who enters the
initial position and orientation of the object. A menu, displayed
to aid the operator, lists other parameters that can be modified,
including the number of cameras to use, the edge threshold to set
in IMFEX, the object model file name specification, and the values
of tuning parameters. All parameters have default values.

7.4 Acquisition 

The acquisition program finds a known moving object to
initialize the object tracking program described in the previous
section [9]. Together, the acquisition and tracking programs
form a system capable of detecting and tracking known 3-D
objects.

The acquisition algorithm represents a special case from the
general domain of 3-D object recognition, and is distinguished
from the latter primarily by the fact that the identity of the
object is known a priori. Also, the acquisition algorithm relies
on motion cues to extract 3-D information (in fact, the object
must be moving for the acquisition algorithm to work).

As shown in Figure 7-2, the acquisition program consists of
five modules which are partitioned to run as two concurrent
processes. One process consists of the vertex tracker, which
extracts vertices from the binary edge output of IMFEX and
continuously updates the 2-D (image) position and velocity of
detected vertices. The other process consists of the remaining
four modules, which do the 3-D analysis resulting in an estimate
of the object's position, orientation and velocity that is used
to initialize the object tracker. Notice that data from the
vertex tracker enters the other process at two points. The input
to the motion stereo module represents the initial input to the
3-D analysis process. Several seconds elapse before a model
match is obtained with these data. Thus the tracking initializer
needs the latest information from the vertex tracker to obtain a
current estimate of the object's position and velocity. A brief
description of each module is given below.

VERTEX TRACKER: Vertices are represented as 5x5 binary edge
patterns. The algorithm effectively does a table lookup based on
converting the 5x5 pattern into a 25-bit lookup table address.
To avoid the prohibitive memory requirements of a $2^{25}$-element
table, the actual implementation uses a two-stage table-lookup
scheme based on overlapping 3x3 patterns which fit in a 5x5
window. The vertex tracker is interrupt driven to run at 60 Hz.
When multiple cameras are being used, the tracker switches to a
new camera at the beginning of each iteration. 2-D position and
velocity information is stored in a shared memory segment for
direct access by the motion stereo and tracking initializer
modules.

MOTION STEREO: The motion stereo module accepts 2-D vertex
position and velocity information from the vertex tracker. Using
a rigid body assumption, the 3-D position of each vertex is
calculated. These measurements are assumed to have bias and
scale factor uncertainties.

STEREO MATCHER: The stereo matcher matches vertices between the
two cameras to obtain accurate 3-D position measurements. As a
result, scale factor and bias uncertainties are reduced so that
the position estimates for unmatched vertices are also improved.
MODEL MATCHER: Computed vertex positions are compared to the
object model to determine the position and orientation of the
object.


TRACKING INITIALIZER: Using the latest 2-D vertex positions from
the vertex tracker, this module performs a weighted least-squares
adjustment of the object position, orientation and velocity. The
result is passed to the object tracker.
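The address arithmetic behind the vertex tracker's lookup can be sketched in Python as below. The function names are hypothetical. Packing a 5x5 binary window row by row gives a 25-bit address, which implies a table of 2^25 = 33,554,432 entries. Each of the nine overlapping 3x3 windows inside the 5x5 window gives only a 9-bit address, or 512 entries, which is why a two-stage scheme is affordable. The report does not describe how its second stage combines the first-stage results, so the sketch stops at computing the nine 9-bit addresses.

```python
def pack_bits(window):
    """Pack a square binary window (rows of 0/1) into an integer, row by row,
    most significant bit first."""
    address = 0
    for row in window:
        for bit in row:
            address = (address << 1) | (1 if bit else 0)
    return address

def subwindow_addresses(window5):
    """9-bit addresses of the nine overlapping 3x3 windows inside a 5x5
    window, ordered by top-left corner in row-major order."""
    return [pack_bits([row[c0:c0 + 3] for row in window5[r0:r0 + 3]])
            for r0 in range(3) for c0 in range(3)]
```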

Each of the 3-D analysis modules has a success/failure
criterion. If a module fails, control is transferred back to the
motion stereo module to make a new attempt. Likewise, if the
object tracker fails, the acquisition program is started again to
reacquire the object.
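The success/failure control flow of the 3-D analysis process can be summarized as a retry loop. In the Python sketch below, each module is a hypothetical callable that returns a result or None on failure. The concurrently running vertex tracker is reached only through a hypothetical `latest_vertices` accessor, standing in for the shared memory segment. The `max_attempts` bound is an assumption added so the sketch terminates; the report describes no such limit.

```python
def acquire(motion_stereo, stereo_matcher, model_matcher, tracking_initializer,
            latest_vertices, max_attempts=100):
    """Run the 3-D analysis chain until every module succeeds. Any failure
    sends control back to motion stereo. Returns the initial object state
    for the tracker, or None if max_attempts is exhausted."""
    for _ in range(max_attempts):
        estimate = motion_stereo(latest_vertices())
        if estimate is None:
            continue
        estimate = stereo_matcher(estimate)
        if estimate is None:
            continue
        pose = model_matcher(estimate)
        if pose is None:
            continue
        # Several seconds may have passed since motion stereo sampled the
        # vertex tracker, so refine with the newest 2-D vertex data.
        state = tracking_initializer(pose, latest_vertices())
        if state is not None:
            return state
    return None
```

An outer loop would alternate between `acquire` and the object tracker, calling `acquire` again whenever the tracker reports that it has lost the object.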
[Figure 7-2. Control and Data Flow of Acquisition System (diagram titled "The Acquisition Program")]